Internet Surfer 2.0

Text File | 1996-02-12 | 27.9 KB | 647 lines

Frequently Asked Questions (FAQS);faqs.205

Critical Facts about the PKZIP Trojan

CIAC has learned that two bogus versions of the popular archiving
utility PKZIP for PC-DOS and MS-DOS machines are being circulated on
several BBSs around the country. The two bogus versions of PKZIP are
2.01 (PKZ201.ZIP and PKZ201.EXE) and 2.2 (PKZIPV2.ZIP and PKZIPV2.EXE).
If you have downloaded any of these files, do not attempt to use them.
You risk the destruction of all the data on your hard disk if you do.

At the current time, the released version of PKZIP is version 1.10. A
new version of PKZIP is expected to be released in the next few months.
Its version number was planned to be 2.00, but may be increased to a
number greater than 2.2 to prevent confusion with the bogus versions.
PKWARE Inc. has indicated it will never issue a version 2.01 or 2.2 of
PKZIP. A good copy of the latest version of PKZIP can always be obtained
from the PKWARE BBS listed below.

According to PKWARE Inc., version 2.01 is a hacked version of PKZIP 1.93
Alpha. While this version does not intentionally do any damage, it is
alpha level software, and may have serious bugs in it.

Version 2.2 is a simple batch file that attempts to erase your C:\ and
C:\DOS directories. If your hard disk has been erased by this program,
you may be able to recover it using hard disk undelete utilities such as
those in Norton Utilities or PCTools. Don't do anything that might
create or expand a file on your hard disk until you have undeleted the
files, as you may overwrite the deleted files, which will destroy them.

To examine a file to see if it is version 2.2, type it to the screen
with the DOS TYPE command. If the file that prints on the screen is a
short batch file with commands such as DEL C:\*.* or DEL C:\DOS\*.*,
then you have the bogus file.

For additional information or assistance, please contact CIAC:

    CIAC at (510) 422-8193/(FTS)
    FAX (510) 423-8002/(FTS)
    send e-mail to ciac@llnl.gov.
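
The manual TYPE check described above can also be approximated with a
small script. The following is only a heuristic sketch: the function
name, the 4096-byte size limit and the exact command pattern are
assumptions made for illustration, not part of the CIAC advisory, and a
negative result does not prove that a file is safe.

```python
import re

# Assumed pattern: a line starting with DEL or ERASE aimed at C:\*.* or
# C:\DOS\*.*, the commands named in the advisory. Case is ignored because
# DOS commands are case-insensitive.
_DESTRUCTIVE = re.compile(
    r"^\s*@?(?:del|erase)\s+c:\\(?:dos\\)?\*\.\*",
    re.IGNORECASE | re.MULTILINE,
)


def looks_like_bogus_batch(data: bytes, max_size: int = 4096) -> bool:
    """Return True if data looks like a short destructive batch file.

    max_size is a hypothetical cutoff: the advisory only says the bogus
    file is "short", so larger files are not flagged.
    """
    if len(data) > max_size:
        return False
    # latin-1 maps every byte to a character, so decoding never fails.
    text = data.decode("latin-1")
    return _DESTRUCTIVE.search(text) is not None
```

A possible use is looks_like_bogus_batch(open(path, "rb").read()) on a
downloaded file before running it, where path is whatever file name you
saved the download under.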

------------------------------------------------------------------------------

Subject: [4] What is an archiver?

There is a distinction between archivers and other compression programs:

- an archiver takes several input files, compresses them and produces
  a single archive file. Examples are arc, arj, lha, zip, zoo.

- other compression programs create one compressed file for each
  input file. Examples are freeze, yabba, compress. Such programs
  are often combined with tar to create compressed archives (see
  question 50: "What is this tar compression program?").

------------------------------------------------------------------------------

Subject: [5] What is the best general purpose compression program?

The answer is: it depends. (You did not expect a definitive answer,
did you?)

It depends whether you favor speed, compression ratio, a standard and
widely used archive format, the number of features, etc... Just as for
text editors, personal taste plays an important role. compress has
4 options, arj 2.30 has about 130 options; different people like
different programs. *Please* do not start or continue flame wars on
such matters of taste.

The only objective comparisons are speed and compression ratio. Here
is a short table comparing various programs on a 33 MHz Compaq 386.
All programs have been run on Unix SVR4, except pkzip and arj which
only run on MSDOS. Detailed benchmarks have been posted in
comp.compression by Peter Gutmann <pgut1@cs.aukuni.ac.nz>.

*Please* do not post your own benchmarks made on your own files that
nobody else can access. If you think that you must absolutely post yet
another benchmark, make sure that your test files are available by
anonymous ftp.

The programs compared here were chosen because they are the most
popular or because they run on Unix and source is available. For ftp
information, see above.
Two programs (hpack and comp-2) have been added because they achieve
better compression (at the expense of speed) and one program (lzrw3-a)
has been added because it favors speed at the expense of compression:

- comp-2 is in wuarchive.wustl.edu:/mirrors/msdos/ddjmag/ddj9102.zip
  (inner zip file nelson.zip),
- hpack is in wuarchive.wustl.edu:/mirrors/misc/unix/hpack75a.tar-z
  and garbo.uwasa.fi:/unix/arcers/hpack75a.tar.Z
- lzrw3-a is in ftp.adelaide.edu.au:/pub/compression/lzrw3-a.c
  [129.127.40.3]

The 14 files used in the comparison are from the standard Calgary
Text Compression Corpus, available by ftp on fsa.cpsc.ucalgary.ca
[136.159.2.1] in
/pub/text.compression.corpus/text.compression.corpus.tar.Z.

The whole corpus includes 18 files, but the 4 files paper[3-6] are
generally omitted in benchmarks. It contains several kinds of file
(ascii, binary, image, etc...) but has a bias towards large files.
You may well get different ratings on the typical mix of files that
you use daily, so keep in mind that the comparisons given below are
only indicative.

The programs are ordered by decreasing total compressed size. For a
fair comparison between archivers and other programs, this size is
only the size of the compressed data, not the archive size.

The programs were run on an idle machine, so the elapsed time
is significant and can be used to compare Unix and MSDOS programs.

[Note: I still have to add all decompression times.]

              size    lzrw3a  compress     lharc     yabba     pkzip    freeze
version:                           4.0      1.02       1.0      1.10     2.3.5
options:                                          -m300000
            ------    ------    ------    ------    ------    ------    ------
bib         111261     49040     46528     46502     40456     41354     41515
book1       768771    416131    332056    369479    306813    350560    344793
book2       610856    274371    250759    252540    229851    232589    230861
geo         102400     84214     77777     70955     76695     76172     68626
news        377109    191291    182121    166048    168287    157326    155783
obj1         21504     12647     14048     10748     13859     10546     10453
obj2        246814    108040    128659     90848    114323     90130     85500
paper1       53161     24522     25077     21748     22453     20041     20021
paper2       82199     39479     36161     35275     32733     32867     32693
pic         513216    111000     62215     61394     65377     63805     53291
progc        39611     17919     19143     15399     17064     14164     14143
progl        71646     24358     27148     18760     23512     17255     17064
progp        49379     16801     19209     12792     16617     11877     11686
trans        93695     30292     38240     28092     31300     23135     22861

         3,141,622 1,400,105 1,259,141 1,200,580 1,159,340 1,141,821 1,109,290

real                   0m35s     0m59s     5m03s     2m40s               5m27s
user                   0m25s     0m29s     4m29s     1m46s               4m58s
sys                    0m05s     0m10s     0m07s     0m18s               0m08s
MSDOS:                                                         1m39s


                 zoo         lha         arj       pkzip         zip       hpack      comp-2
                2.10   1.0(Unix)        2.30       1.93a         1.9       0.75a
                  ah 2.13(MSDOS)         -jm         -ex          -6
              ------      ------      ------      ------      ------      ------      ------
bib            40742       40740       36090       35186       34950       35619       29840
book1         339076      339074      318382      313566      312619      306876      237380
book2         228444      228442      210521      207204      206306      208486      174085
geo            68576       68574       69209       68698       68418       58976       64590
news          155086      155084      146855      144954      144395      141608      128047
obj1           10312       10310       10333       10307       10295       10572       10819
obj2           84983       84981       82052       81213       81336       80806       85465
paper1         19678       19676       18710       18519       18525       18607       16895
paper2         32098       32096       30034       29566       29674       29825       25453
pic            52223       52221       53578       52777       55051       51778       55461
progc          13943       13941       13408       13363       13238       13475       12896
progl          16916       16914       16408       16148       16175       16586       17354
progp          11509       11507       11308       11214       11182       11647       11668
trans          22580       22578       20046       19808       18879       20506       21023

           1,096,166   1,096,138   1,036,934   1,022,523   1,021,043   1,005,367     890,976

real           4m07s       6m03s                               1m49s    1h22m17s      27m05s
user           3m47s       4m23s                               1m43s    1h20m46s      19m27s
sys            0m04s       0m08s                               0m02s       0m12s       2m03s
MSDOS:                     1m49s       2m41s       1m55s

Notes:

- the compressed data for 'zoo ah' is always two bytes longer than for
  lha. This is simply because both programs are derived from the same
  source (ar002, written by Haruhiko Okumura, available by ftp in
  wuarchive.wustl.edu:/mirrors/msdos/arc_lbr/ar002.zip).

- hpack 0.75a gives slightly different results on SunOS
  (nondeterministic behaviour still under investigation).

- the MSDOS versions are all optimized with assembler code and were run
  on a RAM disk. So it is not surprising that they often go faster than
  their Unix equivalent.

------------------------------------------------------------------------------

Subject: [7] Which books should I read?

[BWC 1989] Bell, T.C, Witten, I.H, and Cleary, J.G. "Text Compression",
   Prentice-Hall 1989. ISBN: 0-13-911991-4. Price: approx. US$40
   The reference on text data compression.

[Nel 1991] Mark Nelson, "The Data Compression Book"
   M&T Books, Redwood City, CA, 1991. ISBN 1-55851-216-0.
   Price $36.95 including two 5" PC-compatible disks bearing
   all the source code printed in the book.
   A practical introduction to data compression.
   The book is targeted at a person who is comfortable reading C code
   but doesn't know anything about data compression. Its stated goal
   is to get you up to the point where you are competent to program
   standard compression algorithms.

[Will 1990] Williams, R. "Adaptive Data Compression", Kluwer Books, 1990.
   ISBN: 0-7923-9085-7. Price: US$75.
   Reviews the field of text data compression and then addresses the
   problem of compressing rapidly changing data streams.

[Stor 1988] Storer, J.A. "Data Compression: Methods and Theory",
   Computer Science Press, Rockville, MD. ISBN: 0-88175-161-8.
   A survey of various compression techniques, mainly statistical
   non-arithmetic compression and LZSS compression. Includes complete
   Pascal code for a series of LZ78 variants.

[ACG 1991] Advances in Speech Coding, edited by Atal, Cuperman, and
   Gersho, Kluwer Academic Press, 1991.

[GG 1991] Vector Quantization and Signal Compression, by Gersho and Gray,
   Kluwer Acad. Press, 1991

[CT 1991] Elements of Information Theory, by T.M.Cover and J.A.Thomas
   John Wiley & Sons, 1991.

Review papers:

[BWC 1989] Bell, T.C, Witten, I.H, and Cleary, J.G. "Modeling for Text
   Compression", ACM Computing Surveys, Vol.21, No.4 (December 1989),
   p.557
   A good general overview of compression techniques (as well as
   modeling for text compression); the condensed version of "Text
   Compression".

[Lele 1987] Lelewer, D.A, and Hirschberg, D.S. "Data Compression", ACM
   Computing Surveys, Vol.19, No.3 (September 1987), p.261.
   A survey of data compression techniques which concentrates on
   Huffman compression and makes only passing mention of other
   techniques.

------------------------------------------------------------------------------

Subject: [8] What about patents on data compression algorithms?

[Note: the appropriate group for discussing software patents is
comp.patents (or misc.legal.computing), not comp.compression.]

All patents mentioned here are US patents, and thus probably
not applicable outside the US. See item 70, "Introduction to data
compression" for the meaning of LZ77, LZ78 or LZW.

(a) Run length encoding

- Tsukiyama has two patents on run length encoding: 4,586,027 and
  4,872,009 granted in 1986 and 1989 respectively. The first one
  covers run length encoding in its most primitive form: a length byte
  followed by the repeated byte. The second patent covers the
  'invention' of limiting the run length to 16 bytes and thus the
  encoding of the length on 4 bits. Here is the start of claim 1 of
  patent 4,872,009, just for pleasure:

    1. A method of transforming an input data string comprising a
    plurality of data bytes, said plurality including portions of a
    plurality of consecutive data bytes identical to one another,
    wherein said data bytes may be of a plurality of types, each type
    representing different information, said method comprising the
    steps of: [...]

(b) LZ77

- The Gibson & Graybill patent 5,049,881 covers the LZRW1 algorithm
  previously discovered by Ross Williams. (See item 5 for the ftp site
  with all LZRW derivatives). Claims 4 and 12 are very general and
  could be interpreted as applying to any LZ algorithm using hashing
  (including all variants of LZ78):

    4. A compression method for compressing a stream of input data
    into a compressed stream of output data based on a minimum number
    of characters in each input data string to be compressed, said
    compression method comprising the creation of a hash table,
    hashing each occurrence of a string of input data and subsequently
    searching for identical strings of input data and if such an
    identical string of input data is located whose string size is at
    least equal to the minimum compression size selected, compressing
    the second and all subsequent occurrences of such identical string
    of data, if a string of data is located which does not match to a
    previously compressed string of data, storing such data as
    uncompressed data, and for each input strings after each hash is
    used to find a possible previous match location of the string, the
    location of the string is stored in the hash table, thereby using
    the previously processed data to act as a compression dictionary.

  Claim 12 is identical, with 'method' replaced with 'apparatus'.
  Since the 'minimal compression size' can be as small as 2, the claim
  could cover any dictionary technique of the LZ family. However the
  text of the patent and the other claims make clear that the patent
  should cover the LZRW1 algorithm only.
  The following papers, published before the patent was filed,
  describe applications of hashing to LZ77 compression:

    Brent, R.P. "A Linear Algorithm for Data Compression",
    Australian Computer Journal, Vol.19, No.2 (May 1987), p.64.

    Bell, T. "Longest match string searching for Ziv-Lempel
    compression" Res. Rept. 6/89, Dept. of Computer Science, Univ. of
    Canterbury, New Zealand (Feb 89).

- Phil Katz, author of pkzip, also has a patent on LZ77 (5,051,745)
  but the claims only apply to sorted hash tables, and when the hash
  table is substantially smaller than the window size.

- Robert Jung, author of 'arj', has recently been granted patent
  5,140,321 for one variation of LZ77 with hashing. This patent
  covers the LZRW3-A algorithm, also previously discovered by Ross
  Williams. LZRW3-A was posted on comp.compression on July 15, 1991.
  The patent was filed two months later on Sept 4, 1991. (The US
  patent system allows this because of the 'invention date' rule.)

- Fiala and Greene obtained in 1990 a patent (4,906,991) on all
  implementations of LZ77 using a tree data structure. Claim 1 of the
  patent is much broader than the algorithms published by Fiala and
  Greene in Comm.ACM, April 89. The patent covers the algorithm
  published by Rodeh and Pratt in 1981 (J. of the ACM, vol 28, no 1,
  pp 16-24). It also covers the algorithm previously patented by
  Eastman-Lempel-Ziv (4,464,650), and the algorithms used in lharc,
  lha and zoo.

- IBM patented (5,001,478) the idea of combining a history buffer (the
  LZ77 technique) and a lexicon (as in LZ78).

(c) LZ78

- The LZW algorithm used in 'compress' is patented by IBM (4,814,746)
  and Unisys (4,558,302). It is also used in the V.42bis compression
  standard (see question 11 on V.42bis below) and in Postscript Level 2.
  (Unisys sells the license to modem manufacturers for a one-time
  $25,000 fee.) The IBM patent application was filed three weeks
  before that of Unisys, but the US patent office failed to recognize
  that they covered the same algorithm.
  (The IBM patent is more general, but its claim 7 is exactly LZW.)

- AP coding is patented by Storer (4,876,541). (Get the yabba package
  for source code, see question 2 above, file type .Y)

(d) other data compression algorithms

- IBM holds a patent on the Q-coder implementation of arithmetic
  coding. The arithmetic coding option of the JPEG standard requires
  use of the patented algorithm. No JPEG-compatible arithmetic coding
  method is possible without infringing the patent, because what IBM
  actually claims rights to is the underlying probability model (the
  heart of an arithmetic coder). (See the JPEG FAQ for details.)

- Bacon has patented (4,612,532) some form of Markov modeling.

As can be seen from the above list, *all* the most popular compression
programs (compress, pkzip, zoo, lha, arj) are now covered by patents.
(This says nothing about the validity of these patents.)

Here are some references on data compression patents. A number of them
are taken from the list maintained by Michael Ernst
<mernst@theory.lcs.mit.edu> in
mintaka.lcs.mit.edu:/mitlpf/ai/patent-list (or patent-list.Z).

4,464,650
Apparatus and method for compressing data signals and restoring the
compressed data signals
inventors Lempel, Ziv, Cohn, Eastman
assignees Sperry Corporation and AT&T Bell Laboratories
filed 8/10/81, granted 8/7/84

4,558,302
High speed data compression and decompression apparatus and method
inventor Welch
assignee Sperry Corporation (now Unisys)
filed 6/20/83, granted 12/10/85
The text for this patent can be ftped from rusmv1.rus.uni-stuttgart.de
(129.69.1.12) in /info/comp.patents/US4558302.Z.

4,586,027
Method and system for data compression and restoration
assignee Hitachi, inventor Tsukiyama et al.
filed 08/07/84, granted 04/29/86

4,612,532
inventor Bacon
granted 9/1986

4,814,746
Data compression method
inventors Victor S. Miller, Mark N. Wegman
assignee IBM
filed 8/11/86, granted 3/21/89
A previous application was filed on 6/1/83, three weeks before the
application by Welch (4,558,302)

4,872,009
Method and apparatus for data compression and restoration
assignee Hitachi, inventor Tsukiyama et al.
filed 12/07/87, granted 10/03/89

4,876,541
Stem [sic] for dynamically compressing and decompressing electronic data
inventor James A. Storer
assignee Data Compression Corporation
filed 10/15/87, granted 10/24/89

4,955,066
Compressing and Decompressing Text Files
inventor Notenboom, L.A.
assignee Microsoft
filed 10/13/89, granted 09/04/90

5,001,478
Method of Encoding Compressed Data
filed 12/28/89, granted 03/19/91
inventor Michael E. Nagy
assignee IBM

5,049,881
Apparatus and method for very high data rate-compression incorporating
lossless data compression and expansion utilizing a hashing technique
inventors Dean K. Gibson, Mark D. Graybill
assignee Intersecting Concepts, Inc.
filed 6/18/90, granted 9/17/91

5,051,745
String searcher, and compressor using same
inventor Phillip W. Katz (author of pkzip)
filed 8/21/90, granted 9/24/91

4,906,991
Textual substitution data compression with finite length search window
inventors Fiala,E.R., and Greene,D.H.
filed 4/29/1988, granted 3/6/1990
assignee Xerox Corporation

5,109,433
Compressing and decompressing text files
assignee Microsoft

5,140,321
Data compression/decompression method and apparatus
filed 9/4/91, granted 8/18/92
inventor Robert Jung
assignee Prime Computer

------------------------------------------------------------------------------

Subject: [9] The WEB 16:1 compressor.

[WARNING: this topic has generated the greatest volume of news in the
history of comp.compression. Read this before posting on this subject.]

(a) What the press says

April 20, 1992 Byte Week Vol 4. No. 25:

   "In an announcement that has generated high interest - and more than
   a bit of skepticism - WEB Technologies (Smyrna, GA) says it has
   developed a utility that will compress files of greater than 64KB in
   size to about 1/16th their original length. Furthermore, WEB says its
   DataFiles/16 program can shrink files it has already compressed."
   [...]
   "A week after our preliminary test, WEB showed us the program
   successfully compressing a file without losing any data. But we have
   not been able to test this latest beta release ourselves."
   [...]
   "WEB, in fact, says that virtually any amount of data can be squeezed
   to under 1024 bytes by using DataFiles/16 to compress its own output
   multiple times."

June 1992 Byte, Vol 17 No 6:

   [...] According to Earl Bradley, WEB Technologies' vice president of
   sales and marketing, the compression algorithm used by DataFiles/16
   is not subject to the laws of information theory. [...]

(b) First details, by John Wallace <buckeye@spf.trw.com>:

I called WEB at (404)514-8000 and they sent me some product
literature as well as chatting for a few minutes with me on the phone.
Their product is called DataFiles/16, and their claims for it are
roughly those heard on the net.

According to their flier:

   "DataFiles/16 will compress all types of binary files to
   approximately one-sixteenth of their original size ... regardless of
   the type of file (word processing document, spreadsheet file, image
   file, executable file, etc.), NO DATA WILL BE LOST by DataFiles/16."
   (Their capitalizations; 16:1 compression only promised for files >64K
   bytes in length.)

   "Performed on a 386/25 machine, the program can complete a
   compression/decompression cycle on one megabyte of data in less than
   thirty seconds"

   "The compressed output file created by DataFiles/16 can be used as
   the input file to subsequent executions of the program.
   This feature of the utility is known as recursive or iterative
   compression, and will enable you to compress your data files to a
   tiny fraction of the original size. In fact, virtually any amount of
   computer data can be compressed to under 1024 bytes using
   DataFiles/16 to compress its own output files multiple times. Then,
   by repeating in reverse the steps taken to perform the recursive
   compression, all original data can be decompressed to its original
   form without the loss of a single bit."

Their flier also claims:

   "Constant levels of compression across ALL TYPES of FILES"
   "Convenient, single floppy DATA TRANSPORTATION"

From my telephone conversation, I was assured that this is an
actual compression program. Decompression is done by using only the
data in the compressed file; there are no hidden or extra files.

(c) More information, by Rafael Ramirez <rafael.ramirez@channel1.com>:

Today (Tuesday, 28th) I got a call from Earl Bradley of Web
who now says that they have put off releasing a software version of
the algorithm because they are close to signing a major contract with
a big company to put the algorithm in silicon. He said he could not
name the company due to non-disclosure agreements, but that they had
run extensive independent tests of their own and verified that the
algorithm works. [...]

He said the algorithm is so simple that he doesn't want anybody
getting their hands on it and copying it even though he said they
have filed a patent on it. [...] Mr. Bradley said the silicon version
would hold up much better to patent enforcement and be harder to copy.

He claimed that the algorithm takes up about 4K of code, uses only
integer math, and the current software implementation only uses a 65K
buffer. He said the silicon version would likely use a parallel
version and work in real-time. [...]

(d) The impossibility proofs.

It is impossible for a given program to compress without loss *all*
files greater than a certain size by at least one bit.
This can be proven by a simple counting argument. (Many other proofs
have been posted on comp.compression, *please* do not post yet another
one.)

Assume that the program can compress without loss all files of size >= N
bits. Compress with this program all the 2^N files which have exactly N
bits. All compressed files have at most N-1 bits, so there are at most
(2^N)-1 different compressed files [2^(N-1) files of size N-1, 2^(N-2)
of size N-2, and so on, down to 1 file of size 0]. So at least two
different input files must compress to the same output file. Hence the
compression program cannot be lossless. (Stronger results about the
number of incompressible files can be obtained, but the proofs are a
little more complex.)

This argument applies of course to WEB's case (take N = 64K*8 bits).
Note that no assumption is made about the compression algorithm.
The proof applies to *any* algorithm, including those using an
external dictionary, or repeated application of another algorithm,
or combination of different algorithms, or representation of the
data as formulas, etc... All schemes are subject to the counting
argument. There is no need to use information theory to provide a
proof, just basic mathematics.

This assumes of course that the information available to the
decompressor is only the bit sequence of the compressed data. If
external information such as a file name or a number of iterations is
necessary to decompress the data, the bits providing the extra
information must be included in the bit count of the compressed data.
(Otherwise, it would be sufficient to consider any input data as a
number, use this as the iteration count or file name, and pretend that
the compressed size is zero.)

[See also question 73 "What is the theoretical compression limit?" in
part 2 of this FAQ.]

(e) No software version

Appeared on BIX, reposted by Bruce Hoult <Bruce.Hoult@actrix.gen.nz>:

tojerry/chaos #673, from abailey, 562 chars, Tue Jun 16 20:40:34 1992
Comment(s).
----------
TITLE: WEB Technology
I promised everyone a report when I finally got the poop on WEB's
16:1 data compression. After talking back and forth for a year
and being put off for the past month by un-returned phone calls,
I finally got hold of Marc Spindler who is their sales manager.

_No_ software product is forthcoming, period!

He began talking about hardware they are designing for delivery
at the end of the year. [...]

(f) Product cancelled

Posted by John Toebes <toebes@bnr.ca> on Aug 10th, 1992:

[Long story omitted, confirming the reports made above about the
original WEB claims.]

10JUL92 - Called to Check Status. Was told that testing had uncovered
   a new problem where 'four numbers in a matrix were the same value'
   and that the programmers were off attempting to code a preprocessor
   to eliminate this rare case. I indicated that he had told me this
   story before. He told me that the programmers were still working on
   the problem.

31JUL92 - Final Call to Check Status. Called Earl in the morning and
   was told that he still had not heard from the programmers. [...]
   Stated that if they could not resolve the problem then there would
   probably not be a product.

03AUG92 - Final Call. Earl claims that the programmers are unable to
   resolve the problem. I asked if this meant that there would not be
   a product as a result and he said yes.

(g) Conclusion

The last report given above should put an end to the WEB story.

[Note from the FAQ maintainer: I will keep this long story in the FAQ
for a while, and will remove it when the dust has finally settled
down.]

------------------------------------------------------------------------------

Subject: [11] What is the V.42bis standard?

A description of the V.42bis standard is given in "The V.42bis
standard for data-compressing modems," by Clark Thomborson
<cthombor@theory.lcs.mit.edu>, IEEE Micro, Oct 1992, pp. 41-53.

Short introduction, by Alejo Hausner <hausner@qucis.queensu.ca>:

The V.42bis Compression Standard was proposed by the International
Consultative Committee on Telephony and Telegraphy (CCITT) as an
addition to the v.42 error-correction protocol for modems. Its purpose
is to increase data throughput, and it uses a variant of the
Lempel-Ziv-Welch (LZW) compression method. It is meant to be
implemented in the modem hardware, but can also be built into the
software that interfaces to an ordinary non-compressing modem.
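
As background for the LZW method mentioned above, here is a minimal
sketch of plain textbook LZW. It is not the V.42bis variant, whose
details are not given here; the function names are hypothetical, the
dictionary is allowed to grow without bound, and codes are returned as
a list of integers rather than packed into bits, all of which are
simplifying assumptions.

```python
from typing import List


def lzw_compress(data: bytes) -> List[int]:
    """Encode data as a list of LZW codes (textbook variant)."""
    # The dictionary starts with one entry per possible byte value.
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            # Keep extending the current string while it is known.
            current = candidate
        else:
            # Emit the longest known string and learn the new one.
            codes.append(dictionary[current])
            dictionary[candidate] = next_code
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decompress(codes: List[int]) -> bytes:
    """Rebuild the original bytes from a list of LZW codes."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # The code refers to the string being defined right now.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        # Mirror the entry the compressor added at this step.
        dictionary[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(out)
```

The decompressor rebuilds the same dictionary as the compressor from
the code stream alone, so no dictionary has to be transmitted. That
property is what makes this family of methods suitable for a stream
such as a modem link.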